1. INTRODUCTION

The  rapid  advances  in  computer  technology,  in 
particular,  anticipated  advances  generated  by  LSI,  have 
led  us  to  research  on  parallel  processing.   Although  the 
importance  of  the  problem  has  been  pointed  out  [9],  very 
little  is  known  about  how  parallel  processing  can  be  used 
for  a  class  of  real  programs.   The  introduction  of  a  class 
of  parallel  processing  machines,  such  as  the  CDC  6600  [10] 
or  the  Illiac  IV  [11],  will  allow  parallel  processing  of 
arithmetic operations. As computer systems trend toward a
high degree of parallelism, special arithmetic units, e.g., a
logarithm unit and an exponent unit, and a much more
sophisticated philosophy of machine organization seem to be
important. As computer architecture becomes more
advanced,  the  problem  of  efficiency  of  parallel  processing 
will  fall  on  compilers.   While  existing  detection  algorithms 
of  potential  program  parallelism  seem  applicable  generally 
to  different  classes  of  programs,  their  implementation  must 
depend  on  what  conventional  languages  these  programs  are 
written in. An experiment is in progress not only to measure
potential parallelism in FORTRAN programs, but also to modify
them, if necessary, to extract more parallelism. An analyzer
has been programmed in PL/1, and the tree height reduction
algorithm for parallel processing of blocks of assignment
statements is included as one part of it. In order to analyze
program structure, to measure a program, and to compute a
certain kind of efficiency for parallel processing on a set of
real FORTRAN programs, we assume that a sufficient number of
processing elements is available. We
also  assume  that  binary  arithmetic  operations,  such  as 
addition,  subtraction,  multiplication,  and  division,  are 
performed  in  these  processing  elements,  and  that  each 
operation  takes  the  same  execution  time.   This  thesis  consists 
of  four  sections.   The  first  section  is  the  introduction.   In 
the  second  section  the  effects  of  four  factors  on  parallelism 
within  an  arithmetic  expression  are  described.   The  third 
section  describes  the  distribution  algorithm  and  the  back 
substitution  and  recursion  algorithm.   The  description  of 
implementation  of  the  tree  height  reduction  algorithm  is  in 
section  four. 


2.   RELATIONS  OF  ASSOCIATION  AND  COMMUTATION,  REDUNDANT 
PARENTHESES,  DISTRIBUTION,  BACK  SUBSTITUTION  AND 
RECURSION  TO  PARALLEL  MACHINES 

This  section  describes  several  factors  which  have 
effects  on  a  single  assignment  statement  (i.e.,  an  arithmetic 
expression)  or  block  of  assignment  statements  for  parallel 
processing.   These  factors  are  the  associative  law  and  the 
commutative  law,  redundant  parentheses,  the  distributive 
law,  and  back  substitution  and  recursion.   The  first  three 
factors  are  associated  with  a  single  assignment  statement; 
the  fourth  factor  is  associated  with  a  block  of  assignment 
statements.   The  execution  order  of  an  arithmetic  expression 
can  be  represented  by  a  syntactic  tree.   We  define  tree  height 
to  be  the  number  of  levels  to  compute  an  arithmetic  expression 
when  all  the  operations  on  the  same  level  are  assumed  to  take 
the  same  execution  time.   By  proper  handling  of  the  four 
factors  the  number  of  execution  steps  (in  the  sense  of  tree 
height)  for  an  arithmetic  expression  or  block  of  assignment 
statements  can  often  be  reduced. 

2.1 Associative Law and Commutative Law

For an arithmetic expression, most of the commonly used
compilation techniques result in structures that must be
evaluated serially. When parallel processing is considered,
the intrinsic parallelism within an arithmetic expression can
be recognized by the use of the associative law and the
commutative law. Several papers have investigated these laws
and presented algorithms that build a syntactic tree to achieve
this goal [1]. The tree is such that all operations shown at
the same level can be performed in parallel, provided there is
a sufficient number of processing elements. The execution
time, in the sense of tree height, is claimed to be the best
for Baer and Bovet's algorithm [2]. By the use of the
associative law, for example, the expression

B  +  C  +  D  +  E 

can  be  translated  into  the  structures  for  serial  computation 
and  parallel  computation,  respectively,  shown  in  Fig.  1.1(a) 
and  Fig.  1.1(b). 


Fig.  1.1   Tree  structure  for  serial  and  parallel 
computation  by  the  use  of  association 

In Fig. 1.1(a) the operations are performed sequentially, and
it takes three time steps to evaluate the expression. In
Fig. 1.1(b) the operations at the same level are performed
concurrently, and it takes two time steps. This means that
the expression can be evaluated in fewer execution steps by
parallel processing.
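
The step counts quoted for Fig. 1.1 can be reproduced with a
small sketch. It assumes, as the text does, that every binary
operation takes one step and that enough processing elements
are available; the function names are illustrative only and
do not come from the analyzer described in this thesis.

```python
def serial_steps(n_operands: int) -> int:
    """Steps for a left-to-right chain of n-1 binary additions."""
    return max(n_operands - 1, 0)


def parallel_steps(n_operands: int) -> int:
    """Steps when operands are paired level by level (balanced tree).

    Equals ceil(log2(n_operands)), computed with integers only.
    """
    if n_operands < 1:
        raise ValueError("need at least one operand")
    return (n_operands - 1).bit_length()


# B + C + D + E has four operands.
print(serial_steps(4), parallel_steps(4))
```

For four operands the sketch gives three serial steps and two
parallel steps, matching the discussion of Fig. 1.1(a) and
Fig. 1.1(b).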

By the commutative law, the order of operands combined by
commutative operators, such as '+' or '*', can be exchanged
to permit appropriate association. The combination of
commutation and association results in a structure well
suited for parallel computation. For example, the expression

B + C * D + E

can be translated into the structure shown in Fig. 1.2(b),
compared with the structure shown in Fig. 1.2(a), which is
evaluated serially.


(C * D) + (B + E)
(b)


Fig.  1.2   Tree  structure  for  serial  and  parallel 
computation  by  the  use  of  commutation 
and  association 


Subtraction and division. Since the subtract operator '-' and
the divide operator '/' are binary but not commutative
operators, the commutative law and the associative law as
applied to addition and multiplication cannot be used without
modification. Subtraction can be introduced into an arithmetic
expression and handled as addition. The only difference is
that it is necessary to change operators when the association
is done. The expression B + C - D - E will be translated into
the structure ((B + C) - (D + E)). Unary minus operations may
be handled similarly.

The divide operator '/' behaves similarly to the subtract
operator, under the assumption that all binary operations take
the same execution time. A divide operator preceded by another
divide operator will be changed into a multiply operator in
order to associate two operands into a binary product. The
expression B * C / D / E will be mapped into the structure
((B * C) / (D * E)).
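
The two regroupings above can be checked numerically. The
sketch below uses exact rational arithmetic with arbitrarily
chosen operand values (an assumption made only for
illustration); in floating-point arithmetic the regrouped
forms may differ from the serial forms by rounding error.

```python
from fractions import Fraction

# Arbitrary sample values, chosen only for illustration.
B, C, D, E = (Fraction(v) for v in (7, 5, 3, 2))

# Serial evaluation order: three dependent steps each.
serial_sub = ((B + C) - D) - E
serial_div = ((B * C) / D) / E

# Regrouped order: the two inner operations are independent,
# so each expression needs only two steps.
regrouped_sub = (B + C) - (D + E)
regrouped_div = (B * C) / (D * E)

print(serial_sub == regrouped_sub, serial_div == regrouped_div)
```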

2.2 Redundant Parentheses

The existence of parentheses within an arithmetic expression
affects the execution order. It also affects the tree height
of the expression. To obtain the minimum tree height in
parsing an arithmetic expression, and for ease of
implementation, the given form of an expression should first
be transformed into standard form by removing any redundant
parentheses, so that parallelism is exposed more explicitly.
For example, the arithmetic expression B + (C + (D + E)) has
the standard form B + C + D + E. When subtraction and division
are introduced, the operators must be handled suitably. The
expression A / (B * C * D) + H - (E * F - G) will be
transformed into the standard form A / B / C / D + H - E * F + G.
If we build syntactic trees for both by the use of the
commutative and the associative law, a lower tree height will
be obtained for the expression in standard form than for the
expression as presented. The syntactic trees for the presented
form and the standard form are shown in Fig. 1.3(a) and
Fig. 1.3(b), respectively.


(A / B) / (C * D) + H + G - E * F
(b)


Fig.  1.3   Tree  structures  for  parallel  computation 
before  and  after  the  removal  of  redundant 
parentheses 


2.3 Distributive Law

The distributive law has been investigated and an algorithm
has been developed by Y. Muraoka [3]. The distribution
algorithm, which will be described in Section 3, identifies
two important properties of an arithmetic expression, holes
and spaces. The tree height for an arithmetic expression
obtained by Baer and Bovet's algorithm can be reduced by
distributing multiplications over additions to fill holes and
spaces. Here are two examples. The first is hole-filling
distribution, the second is space-filling distribution. Given
the expression

B * (C * D * E + F),

Fig. 1.4(a) represents the tree structure without distribution
and Fig. 1.4(b) represents the tree structure with
distribution. The tree height for the latter is one less than
that for the former. In the second example, given the
arithmetic expression

B * (C * D + E) + F,

the variable B is distributed over (C * D + E) to fill the
space. The tree height in Fig. 1.5(b) is reduced when compared
with that in Fig. 1.5(a).
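
The height reductions claimed for both examples can be checked
with the effective-length rules (3.1) to (3.3) and (4) given
in Section 3.1.1. The following is a minimal sketch under that
reading of the rules; the helper names are hypothetical and
are not taken from the PL/1 implementation.

```python
def ceil_pow2(m: int) -> int:
    """[m]_2: the smallest power of two that is >= m (m >= 1)."""
    return 1 << (m - 1).bit_length()


def height(effective_length: int) -> int:
    """h = log2(e) for an effective length that is a power of two."""
    return effective_length.bit_length() - 1


def e_product(m: int) -> int:
    """Effective length of a product of m variables."""
    return ceil_pow2(m)


def e_sum(term_lengths) -> int:
    """Effective length of a sum of terms, rule (3.2)."""
    return ceil_pow2(sum(term_lengths))


def e_vars_times_terms(m: int, term_lengths) -> int:
    """Effective length of m variables times parenthesized terms, rule (3.3)."""
    return ceil_pow2(m + sum(term_lengths))


# Example 1: B * (C*D*E + F)  versus  B*C*D*E + B*F
inner1 = e_sum([e_product(3), 1])
before1 = height(e_vars_times_terms(1, [inner1]))
after1 = height(e_sum([e_product(4), e_product(2)]))

# Example 2: B * (C*D + E) + F  versus  B*C*D + B*E + F
inner2 = e_sum([e_product(2), 1])
before2 = height(e_sum([e_vars_times_terms(1, [inner2]), 1]))
after2 = height(e_sum([e_product(3), e_product(2), 1]))

print(before1, after1, before2, after2)
```

Under these rules each example drops from height 4 to height 3
after distribution, in agreement with the statement that the
distributed form is one level lower.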


B * (C * D * E + F)
(a)

Fig.  1.4   Tree  structures  for  parallel  computation 
before  and  after  distribution 




Fig.  1.5   Tree  structures  for  parallel  computation 
before  and  after  distribution 


2.4   Back  Substitution  and  Recursion 

A  block  of  assignment  statements  can  be  regarded  as 
a  block  of  function  equations  by  defining  variables  to  the 
left  of  equal  signs  as  output  variables,  and  variables  to  the 
right  of  equal  signs  as  input  variables.   The  original  block 
of  assignment  statements  can  be  transformed  into  a  block  in 
which  each  output  variable  appears  on  the  left  hand  side  of 
only  one  assignment  statement  by  performing  all  possible 
substitutions  of  one  statement  into  another  which  follows 
it in the original block. As a consequence, more tasks that
can be processed in parallel are obtained, and the total
number of execution steps may be reduced at the cost of an
increase in the number of processing elements. Consider the
block of assignment statements shown below:

X = f (A,B,C)
Y = f (X,D).

Assume that the first statement takes two steps and the second
one takes one step, and that one processing element is used at
each step. The second statement cannot start until the first
has terminated. If we substitute the first into the second,
the sequential relation between them is changed into a
parallel relation, i.e., they are independent of each other
and can be executed simultaneously. The total number of steps
to compute X and Y can then be reduced from 3 to 2.
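
As a concrete illustration, suppose (purely as an assumption,
since the text leaves f unspecified) that both statements are
plain sums: X = A + B + C and Y = X + D. The sketch below
counts steps before and after back substitution, taking the
height of a balanced addition tree as the step count of a
statement.

```python
def sum_steps(n_operands: int) -> int:
    """Height of a balanced tree that adds n_operands values."""
    return (n_operands - 1).bit_length()


x_operands = ["A", "B", "C"]            # X = A + B + C
y_operands = ["X", "D"]                 # Y = X + D

# Original block: Y must wait for X, so the step counts add up.
steps_before = sum_steps(len(x_operands)) + sum_steps(len(y_operands))

# After back substitution: Y = A + B + C + D is independent of X,
# so both statements run side by side and the longer one decides.
y_substituted = x_operands + ["D"]
steps_after = max(sum_steps(len(x_operands)), sum_steps(len(y_substituted)))

print(steps_before, steps_after)
```

The saving comes at a price: in the transformed block the two
statements together perform more additions at the same time,
so more processing elements are needed.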

In DO-loop blocks certain kinds of recurrence computations
exist in addition to an amount of potential parallelism [4],*
which is not taken care of here. Many examples are single
recurrence statements, such as

Y_i = f(Y_{i-1}, A).

Some are cross recurrence statements, such as

X_i = f_1(X_{i-1}, Y_{i-1}, A)

Y_i = f_2(X_{i-1}, Y_{i-1}, B).

These kinds of recurrence computations can be performed by
procedures which allow parallel operation. By the application
of iterative back substitution the recurrence statements can
be handled as a set of parallel processable statements. The
other three factors can then be applied to each of them. The
number of execution steps required to finish a series of
repeated operations can be reduced only under the assumption
that sufficient processing elements are available.

* Parallelism in program loops has been studied by Y. Muraoka
[3]. Implementation has been done by S. C. Chen [6]. Also
see [4].
3. DISTRIBUTION ALGORITHM, BACK SUBSTITUTION
AND RECURSION ALGORITHM

3.1 Distribution Algorithm

Several algorithms exist for recognition of parallelism within
an arithmetic expression [1]. Most concern intrinsic
parallelism obtained by using the associative law and the
commutative law within an arithmetic expression. An important
feature that has been overlooked is the possibility of
reducing computation time (in the sense of tree height) by
distribution. For example, compare parallel computation of the
two equivalent expressions

\sum_{i=0}^{n} A_i X^i

and

A_0 + X (A_1 + X (A_2 + ... + X (A_{n-1} + A_n X) ... )).

If we apply Baer and Bovet's algorithm to evaluate them, we
will get 2 \log_2 (n + 1) steps for the former and 2n steps
for the latter. Thus it is desirable to have the form of the
former rather than that of the latter. This is the
contribution that the distribution algorithm makes to the
tree height reduction algorithm. The distribution algorithm
has been programmed in PL/1 and tested, and will be described
in the next section. Several definitions closely related to
the implementation algorithm are described here.
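
The comparison between the two polynomial forms can be
sketched as follows. The step counts are simply the figures
quoted above, with the logarithm rounded up to an integer
(that rounding is an assumption of this sketch); the two
evaluation functions only confirm that the forms are
equivalent. All names are illustrative.

```python
from fractions import Fraction


def expanded_form(coeffs, x):
    """Sum of A_i * X**i for i = 0..n."""
    return sum(a * x**i for i, a in enumerate(coeffs))


def nested_form(coeffs, x):
    """A_0 + X*(A_1 + X*(... + X*(A_{n-1} + A_n*X)...))."""
    acc = 0
    for a in reversed(coeffs):
        acc = acc * x + a
    return acc


def steps_expanded(n: int) -> int:
    """2 * ceil(log2(n + 1)), the count quoted for the expanded form."""
    return 2 * n.bit_length()


def steps_nested(n: int) -> int:
    """2n, the count quoted for the nested form."""
    return 2 * n


coeffs = [3, 1, 4, 1, 5, 9, 2, 6]       # sample A_0..A_7, so n = 7
x = Fraction(2, 3)
print(expanded_form(coeffs, x) == nested_form(coeffs, x))
print(steps_expanded(7), steps_nested(7))
```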
3.1.1 General Description

In order to symbolize an arithmetic expression in production
form, a simplified structure of arithmetic expressions can be
defined as:

1. A -> (E)
2. A -> V
3. E -> T
4. E -> E + T
5. T -> A
6. T -> T * A

where A: arithmetic expression
      E: arithmetic expression
      T: term
      V: variable.
Subtraction and division behave in a similar manner to
addition and multiplication, as far as the analysis is
concerned. In the development of our implementation,
subtraction and division have been covered using the
assumption that all binary operations take the same execution
time.

Definition 1: An arithmetic expression, A, consists of
addition, multiplication and parenthesis pairs. We assume that
addition and multiplication take the same execution time.

Definition 2: The level, L, of a parenthesis pair in an
arithmetic expression is defined as follows:
We scan an arithmetic expression from left to right, with the
count of parenthesis pairs, C, initially set to zero. Each
time a left parenthesis, (, is encountered we add 1 to C.
Each time a right parenthesis, ), is encountered we subtract
1 from C. During the process the maximum value of the count,
D, associated with a left parenthesis is called the depth of
that parenthesis pair. The level of that parenthesis pair is
obtained as L = D - C + 1. An arithmetic expression enclosed
by a level L parenthesis pair is called a level L arithmetic
expression, A. For convenience any arithmetic expression free
of parenthesis pairs is assumed to be a level 1 expression.
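Example:

A = 1 B * 2(C * D + E)2 + F * 2(G + H)2 1

The digits mark the parenthesis pairs; the outermost pair,
marked 1, is implied.

pair               depth   level
outer (1 ... 1)      1       2
inner (2 ... 2)      2       1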
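
One possible reading of Definition 2 can be sketched in code.
The sketch assumes that C is the count at the moment a left
parenthesis is read and that D is the largest count reached
before the matching right parenthesis; this interpretation is
chosen because it reproduces the example above and is not
confirmed by the text. The function name is hypothetical.

```python
def parenthesis_levels(expr: str):
    """Return (count C, level L) for each pair, in order of '('."""
    pairs = []       # per pair: [count at '(', max count seen inside]
    open_pairs = []  # indices into `pairs` for pairs not yet closed
    count = 0
    for ch in expr:
        if ch == "(":
            count += 1
            pairs.append([count, count])
            open_pairs.append(len(pairs) - 1)
            # Every enclosing pair has now seen this deeper count.
            for idx in open_pairs:
                pairs[idx][1] = max(pairs[idx][1], count)
        elif ch == ")":
            open_pairs.pop()   # raises IndexError on unbalanced input
            count -= 1
    return [(c, d - c + 1) for c, d in pairs]


# The implied outermost pair of the example is written explicitly.
print(parenthesis_levels("(B*(C*D+E)+F*(G+H))"))
```

With the outer pair written out, the sketch reports level 2
for the outer pair and level 1 for both inner pairs.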


Definition 3: The minimum tree height, h[A], denotes the
minimum number of levels of a syntactic tree for an arithmetic
expression A among all possible trees for A as it is presented
[1]. The minimum tree height for an arithmetic expression can
also be computed, without generating a syntactic tree, as
follows:

(3.1) If A = \sum_{i=1}^{m} V_i, then h[A] = \log_2 [m]_2.

(3.2) If A = \sum_{i=1}^{n} T_i or \prod_{i=1}^{n} (T_i), then
      h[A] = \log_2 [ \sum_{i=1}^{n} e[T_i] ]_2.

(3.3) If A = \prod_{i=1}^{m} V_i * \prod_{j=1}^{n} (T_j), then
      h[A] = \log_2 [ m + \sum_{j=1}^{n} e[T_j] ]_2.

For example, the tree for the arithmetic expression
A + B + C + D * E * F + G + H has 4 levels by Baer and Bovet's
algorithm [2], and 5 levels by Hellerman's algorithm [5]. The
minimum tree height by definition for this expression should
be 4.

Definition 4: The effective length of an arithmetic
expression, e[A], is defined as

(4) e[A] = 2^{h[A]}.

Computations are given to illustrate how h[A] and e[A] of an
arithmetic expression A = B + C + D + E * F * G + H + I can be
obtained.

We apply (3.1) to each term in A to obtain

h[B] = h[C] = h[D] = h[H] = h[I] = 0,  h[E * F * G] = 2.

By Definition 4 we obtain

e[B] = e[C] = e[D] = e[H] = e[I] = 1,  e[E * F * G] = 4.

Now applying (3.2) and (4) to A, we obtain

h[A] = \log_2 [ \sum_{i=1}^{6} e[T_i] ]_2
     = \log_2 [1 + 1 + 1 + 4 + 1 + 1]_2
     = \log_2 [9]_2 = \log_2 (16) = 4,

e[A] = 2^{h[A]} = 2^4 = 16.

* For (3.1), multiplication and division behave similarly,
e.g., if \prod_{i=1}^{3} V_i = A * B * C or
\prod_{i=1}^{3} V_i = A * B / C, then h[A] = 2.

** [m]_2 is defined to be the power of 2, 2^k, such that
2^{k-1} < m <= 2^k, e.g., [5]_2 = 8.
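
The computation above is mechanical enough to sketch in a few
lines. The helper names are hypothetical; the only assumptions
are that [m]_2 rounds m up to a power of two and that a single
variable has effective length 1, both as stated in the text.

```python
def ceil_pow2(m: int) -> int:
    """[m]_2: the smallest power of two that is >= m (m >= 1)."""
    return 1 << (m - 1).bit_length()


def log2_exact(p: int) -> int:
    """log2 of a power of two, using integer arithmetic only."""
    return p.bit_length() - 1


# A = B + C + D + E*F*G + H + I
e_EFG = ceil_pow2(3)                     # product of three variables
term_lengths = [1, 1, 1, e_EFG, 1, 1]    # e[T_i] for the six terms

e_A = ceil_pow2(sum(term_lengths))       # rule (3.2) combined with (4)
h_A = log2_exact(e_A)

print(e_EFG, e_A, h_A)
```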

Definition 5: The multiplicative length of an arithmetic
expression, m[A], is defined as follows:

(5.1) If A = \prod_{i=1}^{m} V_i, then m[A] = m.

(5.2) If A = \sum_{i=1}^{n} T_i, then m[A] = e[A].

(5.3) If A = \prod_{i=1}^{m} (T_i), then
      m[A] = \sum_{i=1}^{m} e[T_i].

(5.4) If A = \prod_{i=1}^{m} V_i * \prod_{j=1}^{n} (T_j), then
      m[A] = m + \sum_{j=1}^{n} e[T_j].

Note the difference between e[A] and m[A].

Example 1: A = B + C + D * E

h[A] = 2, e[A] = 4, m[A] = e[A] = 4.

Example 2: A = B * (C + D)

h[A] = 2, e[A] = 4, m[A] = 1 + 2 = 3.
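
The difference between e[A] and m[A] in the two examples can
be reproduced with a short sketch. It applies rules (3.2),
(3.3), (5.2) and (5.4) directly to the term lengths; the
helper name is an assumption made for illustration.

```python
def ceil_pow2(m: int) -> int:
    """[m]_2: the smallest power of two that is >= m (m >= 1)."""
    return 1 << (m - 1).bit_length()


# Example 1: A = B + C + D*E, a sum of three terms.
e_DE = ceil_pow2(2)                  # e[D*E]
e1 = ceil_pow2(1 + 1 + e_DE)         # rule (3.2) with (4)
h1 = e1.bit_length() - 1
m1 = e1                              # rule (5.2): m[A] = e[A]

# Example 2: A = B * (C + D), one variable times one term.
e_CD = ceil_pow2(1 + 1)              # e[C + D]
m2 = 1 + e_CD                        # rule (5.4): m + sum of e[T_j]
e2 = ceil_pow2(m2)                   # rule (3.3) with (4)
h2 = e2.bit_length() - 1

print((h1, e1, m1), (h2, e2, m2))
```

In Example 2 the multiplicative length 3 is smaller than the
effective length 4; the gap of one is what Section 3.1.2 calls
a hole.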

Definition 6: An occupied node in a tree for an arithmetic
expression is defined to be a node where an arithmetic
operation is to be done. A free node is defined to be a node
where no arithmetic operation needs to be done. The root node
is defined to be the node at the top of a tree.

[Figure: example trees with the root node, occupied nodes, and
free nodes labeled]


3.1.2   Holes 

Definition 7: The hole function H associated with an
arithmetic expression A is defined in terms of an absolute
hole function H_A and a relative hole function H_R for each
term, which are defined as follows:

(7.1) If A = \prod_{i=1}^{m} V_i, then
      H[A] = H_A[A] = e[A] - m[A].

(7.2) If A = \sum_{j=1}^{n} T_j, then
      H[A] = H_A[A] = \min_j (H_A[T_j] + H_R[A,T_j]),
      H_R[A,T_j] = 2^{h[A] - n[T_j]} - e[T_j].*

(7.3) If A = \prod_{j=1}^{n} (T_j), then
      H[A] = H_A[A] = \sum_j (H_A[T_j] + H_R[A,T_j]),
      H_R[A,T_j] = 2^{h[A] - n[T_j]} - e[T_j].

* n[T_j] represents the number of occupied nodes when we
traverse the tree for A from the root node of the subtree for
T_j up to the root node of A.

The computation of holes of an expression is illustrated as
follows:

Given an arithmetic expression A = B * (C + D).

Let T_1 = B, T_2 = C, T_3 = D, T_4 = C + D.

For T_4:  By (3.1) we have h[T_2] = 0, h[T_3] = 0.
          By (4) we have e[T_2] = 1, e[T_3] = 1.
          By (3.2) we have h[T_4] = \log_2 [1 + 1]_2 = 1.
          By (4) we have e[T_4] = 2.
          By (5.2) we have m[T_4] = e[T_4] = 2.

For T_1:  By (3.1), (4) and (5.1) we have
          h[T_1] = 0, e[T_1] = 1, m[T_1] = 1.
          By (7.1) we have H_A[T_1] = e[T_1] - m[T_1] = 0.

For A:    By (3.3), h[A] = \log_2 [1 + 2]_2 = 2.
          By (4), e[A] = 2^2 = 4.
          By (7.3), H_R[A,T_1] = 2^{2-1} - e[T_1] = 2^1 - 1 = 1,
                    H_R[A,T_4] = 2^{2-1} - e[T_4] = 2^1 - 2 = 0.

Thus H_A[A] = \sum_j (H_A[T_j] + H_R[A,T_j])
            = (0 + 1) + (0 + 0) = 1.


3.1.3   Spaces 

Given an arithmetic expression of the following form

A = (E) * T_d + T_s

where the conditions e[T_s] < e[E] and h[(E) * T_d] > h[T_s]
are satisfied, and where

E = \sum_i T_i,

T_d is the term to be distributed over (E), and

T_s is the term to accommodate the space as T_d is distributed
over (E).

Let Y = e[(E) * T_d] + e[T_s]. Since \log_2 [Y]_2 = h[A] and
\log_2 ([Y]_2 / 2) = h[A] - 1, we let Y' = Y - [Y]_2 / 2.
If the effective length of A can be reduced by Y', the tree
height of A can also be reduced. So the objective is to find
out if there are enough spaces to accommodate Y' after T_d is
distributed over (E).

Definition 8: The space function S in an arithmetic
expression E = \sum_i T_i with respect to a term T_d is defined
as

S = S_R[E,T_d] = e[(E) * T_d] - \sum_i e[T_i * T_d].

The computation of spaces is illustrated as follows:

Given A = B * (C * D + G) + F.

Let E = C * D + G; then we have

Y = e[B * (E)] + e[F] = 8 + 1 = 9,
Y' = Y - [Y]_2 / 2 = 9 - 16/2 = 1.

Let E' = B * C * D + B * G; then we have

e[B * C * D] = 4, e[B * G] = 2.

Then S_R[E,B] = 8 - (4 + 2) = 8 - 6 = 2.
Since S_R[E,B] = 2 > Y' = 1, space-filling distribution should
be done. The tree height for A' = B * C * D + B * G + F is
lower than that for A. See Fig. 1.5.
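
The space computation for this example can be traced step by
step in a short sketch. The variable names are illustrative,
and the final test uses the strict comparison that appears in
the worked example; whether equality would also justify
distribution is not stated in the text and is left open here.

```python
def ceil_pow2(m: int) -> int:
    """[m]_2: the smallest power of two that is >= m (m >= 1)."""
    return 1 << (m - 1).bit_length()


# A = B * (C*D + G) + F, with E = C*D + G, T_d = B, T_s = F.
e_E = ceil_pow2(ceil_pow2(2) + 1)      # e[C*D + G]
e_B_times_E = ceil_pow2(1 + e_E)       # e[B * (E)] by rule (3.3)
e_F = 1

Y = e_B_times_E + e_F
Y_prime = Y - ceil_pow2(Y) // 2        # length to remove for one level

# Effective lengths of the distributed terms B*C*D and B*G.
e_BCD = ceil_pow2(3)
e_BG = ceil_pow2(2)
S_R = e_B_times_E - (e_BCD + e_BG)     # space function of Definition 8

distribute = S_R > Y_prime
print(Y, Y_prime, S_R, distribute)
```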


3.2 Back Substitution and Recursion Algorithm

For a block of assignment statements organized for serial
computation, the implicit detection of parallel processable
tasks within it can be done by the use of Bernstein's
condition [4] to build a connectivity matrix. By using the
FORK and JOIN technique to analyze the connectivity matrix,
the precedence partitions of tasks can be obtained.
Accordingly, the execution ordering of tasks within a block is
determined. As described earlier, a block of assignment
statements can be regarded as a block of function equations by
defining variables on the left-hand side of equal signs as
output variables, and variables on the right-hand side of
equal signs as input variables. Whenever the left-hand side of
a statement appears on the right-hand side of a subsequent
assignment statement, we substitute the right-hand side of
the former into the latter. The original block of assignment
statements can thus be transformed into a block in which only
one assignment statement is associated with each output
variable. The sequential relation between a set of statements
will be changed into a parallel relation. More parallel
processable tasks and fewer precedence partitions will be
obtained. The distribution algorithm can be applied to each of
the resulting assignment statements, and the tree height for
each statement can be reduced as much as possible. By
comparing the number of execution steps of the original block
with the number of execution steps of the block obtained after
back substitution, we can choose the one with the lower tree
height as our result. If the block of assignment statements is
outside a DO statement, back substitution alone can do this
job. If the block of statements is inside a DO statement, both
back substitution and recursion are needed.

3.2.1   Physical  Description 

In a FORTRAN application program as presented, assignment
statements are executed sequentially on a uniprocessor
machine. Only one operation can be done at a time, regardless
of the relations among the operations. As far as parallel
processing is concerned, it seems reasonable to consider one
block of assignment statements at a time instead of a single
assignment statement, since parallelism among statements and
parallelism within statements can exist at the same time. The
block of assignment statements under consideration here is
assumed to contain no branch or transfer statements.

One goal is to reduce the total number of execution steps of
the original block, which is sequentially organized. In other
words, we try to change sequential relations into parallel
relations by back substitution and recursion, so that more
parallel tasks can be obtained and the number of execution
steps for that block can be reduced for parallel processing.
Hence, it is desired to discover which statements in the
original block can be executed in parallel and which must be
completed before the start of the others. Also, it is
necessary to determine the sequential and parallel relations
among the statements in the block obtained after back
substitution. If we do back substitution wherever it is
possible, the sequential relations among statements can be
changed into a parallel relation, i.e., all statements
obtained after back substitution can be executed in parallel.
Unfortunately, the total number of execution steps for a block
is not always reduced in this way. Sometimes we do not gain the
advantage of reducing the number of execution steps, but
incur the disadvantage of increasing the number of processing
elements required. Thus the results for the original block and
the transformed block should be compared to determine which
one is better suited for parallel processing.

Block outside DO statement. Bernstein's paper [4] classifies
memory locations into four categories:

1) The location is only fetched during the execution of a
block.

2) The location is only stored during the execution of a
block.

3) The first operation involving this location is a fetch.
One of the succeeding operations of a block stores into
this location.

4) The first operation involving this location is a store.
One of the succeeding operations of a block fetches
this location.

We see that back substitution does not concern memory
locations falling into the first three categories; it concerns
only the fourth category. For those memory locations the
first operation involving the location is a store and one of
the succeeding operations is a fetch. Hence back substitution
can be done. That is, the right-hand side of the equal sign of
the previous statement is substituted into the right-hand
side of the succeeding one. The entry '1' must be entered
into the connectivity matrix of the original block, and the
entry '0' must be entered into the connectivity matrix of the
transformed block. With respect to the next statement in a
sequential process, if a common location is stored by two
preceding statements, only the second statement modifies the
common location when back substitution is applied. The first
of the two statements will be deleted from the block obtained
after back substitution. In the transformed block, even if a
memory location still falls into category 4, both statements
can be executed in parallel without conflict under the
assumption that all input variables are fetched before
execution for parallel processing.
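
The four categories and the substitution step can be sketched
for a straight-line block. In this sketch a statement is
reduced to a pair (output variable, list of input variables),
operators are ignored, and each statement is assumed to fetch
all of its inputs before storing its output. These
simplifications and all names are assumptions made for
illustration; the deletion of superseded statements described
above is not modeled.

```python
def classify_locations(block):
    """Map each variable to its category number 1..4."""
    first_op, fetched, stored = {}, set(), set()
    for target, sources in block:
        for v in sources:                     # fetches come first
            first_op.setdefault(v, "fetch")
            fetched.add(v)
        first_op.setdefault(target, "store")  # then the store
        stored.add(target)
    categories = {}
    for v, first in first_op.items():
        if v not in stored:
            categories[v] = 1                 # only fetched
        elif v not in fetched:
            categories[v] = 2                 # only stored
        elif first == "fetch":
            categories[v] = 3                 # fetch, later store
        else:
            categories[v] = 4                 # store, later fetch
    return categories


def back_substitute(block):
    """Replace each fetched output variable by the inputs that define it."""
    definitions, result = {}, []
    for target, sources in block:
        expanded = []
        for v in sources:
            expanded.extend(definitions.get(v, [v]))
        definitions[target] = expanded
        result.append((target, expanded))
    return result


block = [("X", ["A", "B", "C"]), ("Y", ["X", "D"])]
print(classify_locations(block))
print(back_substitute(block))
```

In the sample block only X falls into category 4, and after
substitution Y depends on A, B, C and D alone, so the two
statements no longer have to run one after the other.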

Block inside DO statement. In a block inside a DO statement
some types of recurrence operations are considered. They are
single recurrence statements and cross recurrence statements.
These types of recurrence occur because memory locations fall
into categories 3 and 4. These types of recurrence operations
can be performed by procedures which allow parallel
operations. We apply back substitution repeatedly, as the
value of the index variable requires. Theoretically,
recurrence statements may be handled as a set of parallel
processable statements, as many as the number of iterations.
In practice, two problems occur. One arises if the number of
iterations happens to be large, the other if the value of the
ending variable is unknown. In the first case a reasonable
number of iterations is performed. At this stage we take the
conservative point of view of multiplying the number of
execution steps (in the sense of tree height), obtained after
applying Hu's algorithm [7] to a set of independent tree
graphs, by the ratio of the computed number of iterations
required to the revised number of iterations done in
parallel. Alternatively, a reasonable approximation formula
may be developed. In the second case a reasonable number will
be assumed to resolve this difficulty.

As  a  recurrence  statement  in  a  single  DO  statement 
is  considered,  the  procedure  of  iterative  operations  of 
back  substitution  will  be  taken  regardless  of  whether  the 
output  variable  is  regarded  as  a  single  variable  or  sub- 
scripted variable.   But  the  measurement  for  parallel 
processing  will  be  different.   For  the  first  example 
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DO  10   1=1,  10 

10   A(I+1)  =  A(I)  +  B 
After  the  iterative  operation  of  back  substitution,  the 
statement  will  have  the  form  at  the  tenth  iteration 

A(  )  =  A(  )+B+B+B+B+B+B+B+B+B+B 

The number of levels and the number of operations for each
level will be obtained by applying the distribution algorithm.
With this information, Hu's algorithm might lead to better
results for parallel processing, i.e., the minimum number of
PEs required within the time limit of the tree height. For the
second example,

      DO 10 I = 1, 10
   10 A(I) = A(I) + I

after the iterative operations of back substitution, a set of
ten statements will be obtained. These ten elements of array A
can be computed simultaneously. At this point there is no
information about whether all ten elements of array A or only
the tenth element will be referenced in some later task. The
conservative assumption that all ten elements of array A will
be referenced must be made. Hu's algorithm should be applied to
the set of ten parallel processable statements by combining
them into one big tree to obtain the number of levels of this
tree and the minimum number of PEs required. When cross
recurrence statements in a single DO statement are considered,
procedures similar to those used for the second example above
can be used. In addition, each time one block of such recurrence
operations is handled, only one level of DO statement will be
referenced, whether it is an inner or an outer DO statement.
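
As a rough illustration of the two examples, the Python sketch below
unrolls the first recurrence by repeated back substitution and lists
the independent statements of the second. The function names and the
(base, count) encoding are assumptions chosen for this sketch; the
analyzer itself is written in PL/1 and its data layout is not shown
here.

```python
def unroll_first_example(start, end):
    """Back-substitute A(I+1) = A(I) + B for I = start..end.

    Each element is stored as (base, count), meaning
    A(base) followed by `count` additions of B.
    """
    expr = {start: (start, 0)}           # A(start) is an input of the loop
    for i in range(start, end + 1):
        base, count = expr[i]
        expr[i + 1] = (base, count + 1)  # substitute the previous iteration
    return expr


def render(subscript, base, count):
    """Write one unrolled statement in FORTRAN-like form."""
    return f"A({subscript}) = A({base})" + "+B" * count


def sum_tree_height(operands):
    """Levels of a balanced binary tree that adds `operands` values."""
    return (operands - 1).bit_length()


def second_example(start, end):
    """A(I) = A(I) + I has no cross-iteration dependence: one statement per I."""
    return [f"A({i}) = A({i}) + {i}" for i in range(start, end + 1)]


unrolled = unroll_first_example(1, 10)
print(render(11, *unrolled[11]))             # the tenth-iteration form
print(sum_tree_height(1 + unrolled[11][1]))  # A(1) plus ten copies of B
print(len(second_example(1, 10)))            # ten independent statements
```

The height printed for the first example is that of a plain balanced
sum over eleven operands; it does not model the distribution algorithm
or Hu's algorithm.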

3.2.2.   Discussion 

Analyzing and measuring a real FORTRAN application program
in order to compute its efficiency for parallel processing is
not a simple matter. The scheme in this algorithm is designed
for preliminary-stage experiments.

Several cases are not handled in the best way. In a
FORTRAN program, transcendental functions and arithmetic
functions usually appear in assignment statements. For a
transcendental function, it seems most proper to store the
information of the tree for the assignment statement, namely
the number of levels and the number of operations for each
level. Each time the transcendental function is found, this
information will be referenced. Two problems exist. One is
that the information may differ according to how the
subprogram for the transcendental function is written. The
other is that the time required to execute the transcendental
function is many times that of the other operations. The remedy
being used, for simplicity, is that each time a transcendental
function is encountered, we replace it by an expression
consisting of a sum of terms. This expression is made of the
function name twice and the argument once. Examples are as
follows:
Example 1: The assignment statement

      SUM1 = C * DCOS(SUM2)

will have the form

      SUM1 = C * (DCOS + DCOS + SUM2).

Example 2: The assignment statement

      Y = X + SIN(A)

will have the form

      Y = X + SIN + SIN + A.

The operator of exponentiation should be handled in a
similar way. The assignment statement

      RZ = X**2 + Y**2

will have the form

      RZ = LOG + LOG + X + 2 + LOG + LOG + Y + 2.

An arithmetic function, for simplicity, will be regarded as a
subscripted variable, on the assumption that nothing can be
known about it. It is hoped that some future effort will be
devoted to this area.
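
The replacement rule for transcendental functions and exponentiation
can be imitated with two regular expressions. The sketch below is an
assumption-laden illustration: the set of function names is invented
for the example, only simple arguments (a single name or constant) are
handled, and the replacement is always parenthesized, leaving the
removal of redundant parentheses to a later standardization step.

```python
import re

# Assumed list of transcendental function names (not from the source).
TRANSCENDENTAL = {"SIN", "COS", "DCOS", "EXP", "LOG"}

_CALL = re.compile(r"\b([A-Z][A-Z0-9]*)\s*\(\s*([A-Z0-9]+)\s*\)")
_POWER = re.compile(r"\b([A-Z][A-Z0-9]*)\s*\*\*\s*([A-Z0-9]+)")


def replace_functions(stmt):
    """Replace F(arg) by (F + F + arg) and X**n by (LOG + LOG + X + n)."""

    def call(match):
        name, arg = match.group(1), match.group(2)
        if name not in TRANSCENDENTAL:
            return match.group(0)       # e.g. an array reference: untouched
        return f"({name} + {name} + {arg})"

    stmt = _CALL.sub(call, stmt)
    return _POWER.sub(
        lambda m: f"(LOG + LOG + {m.group(1)} + {m.group(2)})", stmt
    )


print(replace_functions("SUM1 = C * DCOS(SUM2)"))
print(replace_functions("Y = X + SIN(A)"))
print(replace_functions("RZ = X **2 + Y **2"))
```

For the second and third statements the output keeps parentheses that
the forms shown in the text have already dropped.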

For a block outside a DO statement, one problem should be
mentioned. Consider the case where an element of an array is
involved in a statement, but the subscript expressions of that
element are the results of some previous calculations. The
conservative point of view is that the subscript expressions
must be calculated before they are used in the succeeding
statement. With the assumption that the calculation of the
subscript expressions can be performed by the control unit,
however, this problem can be solved, and the sequential
inter-statement relation is changed into a parallel
inter-statement relation.

The input and output statements are omitted here. Clearly
the variables appearing in READ statements can be regarded as
input variables of a block, and variables appearing in WRITE
statements can be regarded as output variables of a block.
Essentially, the memory locations associated with the
variables in an input statement are modified first and then
fetched in the succeeding statements; the memory locations
associated with the variables in an output statement are
accessed in exactly the reverse order. This case should be
added at a later stage.

4. IMPLEMENTATION OF THE TREE HEIGHT REDUCTION ALGORITHM

A program analyzer that takes a real FORTRAN application
program and analyzes it for a parallel machine has been
programmed in PL/1. The analyzer not only measures potential
parallelism in a program but also modifies the program if
necessary to extract more parallelism. It consists of the
masterprogram*, the DO-loop subprogram* and the assignment
statements subprogram, which is the tree height reduction
algorithm on a block of assignment statements. The tree height
reduction algorithm will be described in two parts: back
substitution and recursion, and tree height reduction on an
arithmetic expression. The interface between the masterprogram
and the assignment statements subprogram, and the interface
between the DO-loop subprogram and the assignment statements
subprogram, are also described in this section.

4.1 Back Substitution and Recursion

In  this  part  the  sets  of  input  variables  and  output 
variables,  the  connectivity  matrices  representing  inter- 
statement  relations  in  the  original  block  and  the  block 
obtained  after  back  substitution,  and  other  tables  related 
to  recursion  are  generated. 

* The masterprogram is described in [8]. The DO-loop
subprogram is described in [6].

Both the input variable set and the output variable set
contain one entry for each variable. Each entry contains the
identifier and the type (single variable or subscripted
variable). To each entry there corresponds a local statement
number list that contains entries for statements. The output
variable set is generated before the input variable set. We
scan the left-hand sides of the equal signs of the assignment
statements in sequence. If the left-hand side of an equal sign
is a new variable, the internal form of the identifier of that
variable is entered. If it is a single variable, the type
entry '0' is entered. If it is a subscripted variable, the type
entry '1' is entered. In the case of a match with an existing
entry, the local statement number of the succeeding statement
must be entered into the statement number list associated with
that output variable. For two subscripted variables to match,
both the identifier and the coordinates must be the same.
After the left-hand sides of all assignment statements in a
block have been scanned, the sequence of entries in each
statement number list is reversed.
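
The left-hand-side scan can be sketched as follows. This is a minimal
Python illustration, not the PL/1 procedure itself: statements are
given as plain strings, the full left-hand-side text (identifier plus
subscript) is used as the matching key, and the final reversal is
modeled by inserting each new statement number at the front of its
list during the scan. All of these choices are assumptions.

```python
def output_variable_set(statements):
    """Build the output variable set for 'LHS = RHS' statement strings."""
    types = {}      # left-hand side -> '0' (single) or '1' (subscripted)
    numbers = {}    # left-hand side -> list of local statement numbers
    for n, stmt in enumerate(statements, start=1):
        lhs = stmt.split("=", 1)[0].replace(" ", "")
        if lhs not in types:                     # a new output variable
            types[lhs] = "1" if "(" in lhs else "0"
            numbers[lhs] = []
        numbers[lhs].insert(0, n)                # newest first while scanning
    for lst in numbers.values():
        lst.reverse()                            # reversed after the scan
    return types, numbers


block = ["A = B * C", "C = D + G", "E = B * C", "F = A"]
types, numbers = output_variable_set(block)
print(types)
print(numbers)
```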

The input variable set and the connectivity matrices for
the original block and for the block obtained after back
substitution can be generated concurrently. We scan the
right-hand sides of the equal signs of the assignment
statements in sequence. All new input variables must be
entered into the input variable set. When an input variable is
found, it must be compared with the entries in the output
variable set. If the input variable is the same as one of the
output variables, and the local statement number of the
statement where that output variable appears is less than that
of the statement where the input variable appears, then back
substitution should be done. In the original block, the fact
that the first operation involving the variable is a store and
the succeeding operation involving it is a fetch must be
recorded by entering '1' into the connectivity matrix for the
original block. The entry is made by taking the local statement
number of the assignment statement being processed as the
column index, and the local statement number of the previous
statement as the row index. Consider, for example, a block of
four assignment statements:

      A = B * C
      C = D + G
      E = B * C
      F = A

The output variable set and the input variable set are shown
as follows:

      Output variable set        Input variable set
      Identifier   Type          Identifier   Type
      A            0             B            0
      C            0             C            0
      E            0             D            0
      F            0             G            0

The statement number list associated with each output variable
is shown in the following:

      Identifier   Statement number list
      A            1
      C            2
      E            3
      F            4

The block obtained after back substitution contains

      A = B * C
      C = D + G
      E = B * (D + G)
      F = A

The connectivity matrices for the original block and the
block obtained after back substitution are:

      Connectivity matrix for        Connectivity matrix for
      the original block             the transformed block

          1  2  3  4                     1  2  3  4
      1   0  0  0  1                 1   0  0  0  1
      2   0  0  1  0                 2   0  0  0  0
      3   0  0  0  0                 3   0  0  0  0
      4   0  0  0  0                 4   0  0  0  0
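
The four-statement example can be reproduced with the short Python
sketch below. It is an illustration under stated assumptions, not the
analyzer's SUBST procedure. In particular, the rule that a definition
is not substituted when one of its operands has been stored again in
between (which is why F = A is left alone here, since C changes after
statement 1) is one reading consistent with the example; the text does
not state the rule explicitly.

```python
import re

BLOCK = ["A = B * C", "C = D + G", "E = B * C", "F = A"]
NAME = r"[A-Z][A-Z0-9]*"


def split(stmt):
    lhs, rhs = (part.strip() for part in stmt.split("=", 1))
    return lhs, rhs


def connectivity(block):
    """Entry [row][col] = 1: a store in statement row, a fetch in col."""
    n = len(block)
    matrix = [[0] * n for _ in range(n)]
    last_store = {}                      # variable -> index of latest store
    for col, stmt in enumerate(block):
        lhs, rhs = split(stmt)
        for var in re.findall(NAME, rhs):
            if var in last_store:
                matrix[last_store[var]][col] = 1
        last_store[lhs] = col
    return matrix


def back_substitute(block):
    out, defs, stores = [], {}, []       # defs: var -> (index, expression)
    for i, stmt in enumerate(block):
        lhs, rhs = split(stmt)

        def repl(match):
            var = match.group(0)
            if var not in defs:
                return var
            j, expr = defs[var]
            # Assumed safety rule: keep the variable if an operand of its
            # defining expression was stored again between j and i.
            if any(s in re.findall(NAME, expr) for s in stores[j + 1:i]):
                return var
            return f"({expr})"

        new_rhs = re.sub(NAME, repl, rhs)
        out.append(f"{lhs} = {new_rhs}")
        defs[lhs] = (i, new_rhs)
        stores.append(lhs)
    return out


print(connectivity(BLOCK))
print(back_substitute(BLOCK))
print(connectivity(back_substitute(BLOCK)))
```

Rows and columns are zero-based in the code, so row 0 corresponds to
local statement number 1.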

A block of assignment statements in a DO statement is
handled as described in Section 3. In addition to the tables
built before, two new tables need to be built. One is the
distance table, the other is the recursion table. The entries
in both tables are associated with the input (i.e., right-hand
side) variables. To compare an input variable with an output
(i.e., left-hand side) variable when both are subscripted
variables, their identifiers are compared first. If the
identifiers are the same, we then compare their subscript
expressions. Assume both variables are one-dimensional arrays
and the subscript expressions are linear forms. We discuss the
general method in terms of the example shown below.

First, each input variable subscript is subtracted from
the output variable subscript. If the difference is positive,
the difference is entered in the distance table. Otherwise we
enter a zero in the distance table. Thus I - (I - 1) leads to
the '1' entries.

The local statement number of the assignment statement
where the output variable appears is entered into the recursion
table, with the row index indicating the current assignment
statement and the column index indicating the occurrence number
of the input variable on the right-hand side of the equal sign.
A block of two assignment statements inside a DO statement
illustrates this:

                                          Local statement number
         DO 20 I = 2, 10
         A(I) = A(I - 1) + B(I - 1)                 1
   20    B(I) = I + A(I - 1)                        2

The distance table and the recursion table are represented as
follows:

      Distance Table                 Recursion Table
      position    1  2  3  4  5      position    1  2  3  4  5
      St. no. 1   1  1  0  0  0      St. no. 1   1  2  0  0  0
      St. no. 2   0  1  0  0  0      St. no. 2   0  1  0  0  0

The entries in the distance table and the recursion table are
used to determine which statement, from which previous
iteration, has to be substituted at the current iteration.
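
The construction of the two tables for this example can be sketched in
Python as below. The encoding is an assumption made for illustration:
each subscript is reduced to its constant offset from the index
variable I, a single variable carries the offset None, and the
recursion entry is written whenever the identifiers match (how the
analyzer treats a non-positive difference in the recursion table is
not stated in the text).

```python
WIDTH = 5   # number of positions per row, as in the tables shown

# Output variables: array name -> (local statement number, offset from I).
OUTPUTS = {"A": (1, 0), "B": (2, 0)}

# Input variables in order of occurrence on each right-hand side.
INPUTS = {
    1: [("A", -1), ("B", -1)],      # A(I) = A(I - 1) + B(I - 1)
    2: [("I", None), ("A", -1)],    # B(I) = I + A(I - 1)
}


def build_tables(outputs, inputs, width=WIDTH):
    distance, recursion = {}, {}
    for stmt, operands in inputs.items():
        d_row, r_row = [0] * width, [0] * width
        for pos, (name, offset) in enumerate(operands):
            if offset is None or name not in outputs:
                continue                        # not a matching array element
            out_stmt, out_offset = outputs[name]
            diff = out_offset - offset          # output minus input subscript
            d_row[pos] = diff if diff > 0 else 0
            r_row[pos] = out_stmt               # statement defining the output
        distance[stmt], recursion[stmt] = d_row, r_row
    return distance, recursion


distance, recursion = build_tables(OUTPUTS, INPUTS)
print(distance)
print(recursion)
```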

4.2 Tree Height Reduction in an Arithmetic Expression

The presented form of an arithmetic expression in the
original block, or in the block obtained after back
substitution, must be transformed into a standard form before
it can be processed by the tree height reduction algorithm.
The rules used to omit redundant parentheses, and so bring the
presented form into the standard form, are quite similar to the
rules of ordinary mathematical notation. The precedence of the
arithmetic operators and the three rules are listed as follows:

The precedence of arithmetic operators

      precedence 1 : '+', '-'
      precedence 2 : '*', '/'

Rule 1: The outermost parentheses must be omitted.
Rule 2: If only operators of precedence 2 appear inside a
        parenthesis pair, the parentheses must be omitted. If
        the parenthesis pair is preceded by '/', then we must
        properly adjust the operators inside the parentheses.
Rule 3: If the precedence of the operators outside both ends of
        a parenthesis pair is not higher than the precedence of
        the operators inside the parentheses, the parentheses
        must be omitted. If the parenthesis pair is preceded by
        a '-', then we must properly adjust the operators
        inside the parentheses.
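
The operator adjustment mentioned in Rules 2 and 3 can be illustrated
with a small Python sketch. It assumes the group inside the
parentheses has already been found eligible for removal under one of
the rules and contains no nested parentheses; the function name and
token-list representation are hypothetical.

```python
# Operators to swap inside a group, keyed by the operator that precedes it.
FLIP = {
    "-": {"+": "-", "-": "+"},   # A - (B + C)  ->  A - B - C
    "/": {"*": "/", "/": "*"},   # A / (B * C)  ->  A / B / C
}


def drop_parentheses(preceding_op, inner_tokens):
    """Return the tokens of the group with its operators adjusted."""
    swap = FLIP.get(preceding_op, {})
    return [swap.get(tok, tok) for tok in inner_tokens]


print(drop_parentheses("/", ["B", "*", "C"]))
print(drop_parentheses("-", ["B", "-", "C", "*", "D"]))
print(drop_parentheses("+", ["B", "-", "C"]))
```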

The internal representation of an arithmetic expression is
a string of integers. The numbers that stand for operands are
small integers; those that stand for operators are large
integers. According to the number of parenthesis pairs it
contains, the arithmetic expression in standard form is
decomposed into an array. One row represents the arithmetic
expression within one parenthesis pair. A negative integer
stands for an arithmetic expression within a parenthesis pair;
its absolute value points to the row where the referenced
arithmetic expression is stored.
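
A possible shape for this row-based representation is sketched below
in Python. The concrete integer codes (operands numbered from 1,
operators from 101) are assumptions made for the example; the text
only says that operand codes are small and operator codes are large.

```python
import re

# Assumed "large" operator codes; the analyzer's real values are not given.
OPERATOR_CODE = {"+": 101, "-": 102, "*": 103, "/": 104}


def decompose(expr):
    """Split an expression into rows, one per parenthesis pair.

    Row 1 is the outermost expression. A negative entry -k stands for
    the parenthesized expression stored in row k.
    """
    tokens = re.findall(r"[A-Z][A-Z0-9]*|[-+*/()]", expr)
    operand_code = {}
    rows = [[]]                     # rows[0] is unused: row numbers start at 1

    def parse(pos):
        rows.append([])
        index = len(rows) - 1
        row = rows[index]
        while pos < len(tokens):
            tok = tokens[pos]
            if tok == "(":
                child, pos = parse(pos + 1)
                row.append(-child)  # pointer to the row of the inner group
            elif tok == ")":
                return index, pos + 1
            elif tok in OPERATOR_CODE:
                row.append(OPERATOR_CODE[tok])
                pos += 1
            else:
                row.append(operand_code.setdefault(tok, len(operand_code) + 1))
                pos += 1
        return index, pos

    parse(0)
    return rows[1:], operand_code


rows, codes = decompose("B * (D + G)")
print(rows)
print(codes)
```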

For each arithmetic expression the number of levels and
the number of operations involved for each level are calculated
after possible distributions. When a term of the form
$\prod_i V_i$ is found, it can be regarded as a single term
$T_j$ if its length is a power of two, or as a product of terms
$\prod_j T_j$, each of whose lengths is a power of two. When a
term of the form $\prod_i V_i * \prod_j (T_j)$ is found,
hole-filling distribution is tested on the numerator and the
denominator separately. The detection of hole-filling
distribution between the numerator and the denominator then
follows. If the condition is satisfied, the hole-filling
distribution should be done. The number of terms within the
arithmetic expression currently being processed will be
increased by the deletion of an arithmetic expression of higher
level over which distribution has been done.

After the detection of hole-filling distribution on each
term has been made, we have an arithmetic expression in the
form $\sum_k T_k$. If one term $T_k$ is in the form
$\prod_i V_i * \prod_j (T_j)$, the space-filling distribution
needs to be detected. (One remark should be made concerning
distribution. The distributing term is distributed over each
term in parentheses, but distribution with folding is not
implemented in the distribution algorithm.) Finally, the
computation is performed to determine the number of levels and
the number of operations for each level of a tree for each
term. We do not actually build a tree structure for an
arithmetic expression. Instead, we adopt a mechanism of
recording the sequence in which two terms are combined into one
term of higher level. The number of free nodes can be computed
accordingly. This approach allows us to assign an operation to
the level just before its result is needed. If the information
computed is represented as a tree structure, the tree built is
the one constructed at the latest stage, as in [5]. Consider an
arithmetic expression of level 1,

      A = B + C + D*E*F = T1 + T2 + T3

If we represent the procedures of the mechanism used in terms
of constructing a tree, we have the following tree structure:

      [Tree diagram over the leaves B, C and D*E*F; not
      recoverable from the source.]

The sequence of combining two subtrees into a tree of higher
level can be shown in this equivalent table:

      Temporary result    Left-hand operand    Right-hand operand
      T4        <-        T1                   T2
      T5        <-        T4                   T3

Suppose we assign a weight of 1 to each node where a temporary
result is to be generated. Then the term D * E * F will have
the weight 1, and both B and C have the same weight, 2.
Obviously, the number of free nodes is 1 for both B and C, and
0 for D * E * F. The preferred tree structure is shown as
follows:

      Level    No. of operations (tree over B + C + D*E*F)
      3        1
      2        2
      1        1

The  same  procedures  for  an  arithmetic  expression  of  any 
level  can  be  applied  recursively  until  an  arithmetic 
expression  of  level  1  is  reached. 
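
The level counts for B + C + D*E*F can be reproduced with the Python
sketch below. Unlike the mechanism described in the text, the sketch
does build an explicit tree (nested tuples) for clarity; the rule of
always joining the two lowest terms and the helper names are
assumptions, chosen because they yield the same combination sequence
(T4 from T1 and T2, then T5 from T4 and T3) and the same counts.

```python
import heapq
from collections import Counter
from itertools import count


def balanced(leaves):
    """Combine leaves pairwise into a tree of minimum height."""
    nodes = list(leaves)
    while len(nodes) > 1:
        nxt = [(nodes[i], nodes[i + 1]) for i in range(0, len(nodes) - 1, 2)]
        if len(nodes) % 2:
            nxt.append(nodes[-1])       # odd node waits for the next round
        nodes = nxt
    return nodes[0]


def height(tree):
    if isinstance(tree, str):
        return 0
    return 1 + max(height(tree[0]), height(tree[1]))


def combine_terms(terms):
    """Repeatedly join the two lowest terms into one term of higher level."""
    tie = count()                       # tie-breaker so trees are never compared
    heap = [(height(t), next(tie), t) for t in terms]
    heapq.heapify(heap)
    while len(heap) > 1:
        _, _, a = heapq.heappop(heap)
        _, _, b = heapq.heappop(heap)
        joined = (a, b)
        heapq.heappush(heap, (height(joined), next(tie), joined))
    return heap[0][2]


def latest_levels(tree):
    """Place each operation on the level just before its result is needed."""
    ops = Counter()

    def visit(node, level):
        if isinstance(node, str):
            return
        ops[level] += 1
        visit(node[0], level - 1)
        visit(node[1], level - 1)

    visit(tree, height(tree))
    return dict(ops)


terms = ["B", "C", balanced(["D", "E", "F"])]   # T1, T2, T3
tree = combine_terms(terms)
print(height(tree))
print(latest_levels(tree))
```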

4.3 Interface

The assignment statements subprogram is one part of the
analyzer, which consists of three subprograms. With respect to
the others, the assignment statements subprogram functions as
a service subprogram. The interface that integrates these three
subprograms is described here.

Since a block of assignment statements may be formed in
either the masterprogram or the DO-loop subprogram, different
code numbers are needed to distinguish the two cases. In this
analyzer the code '0' indicates a call issued from the
masterprogram, and the code '1' indicates a call issued from
the DO-loop subprogram. According to the calling mechanism, the
count of assignment statements and their global statement
numbers in the program are passed to the assignment statements
subprogram when a call of that subprogram is issued. The
integers standing for the index variable, the starting
variable, the ending variable and the increment variable are
also transferred. In addition, the original block of assignment
statements must be referenced each time by the assignment
statements subprogram.

APPENDIX A

Overall flow-chart of the tree height reduction algorithm for
a block of assignment statements outside a DO statement.


      Enter
        |
      Back substitution
        |
      Tree height reduction on an arithmetic expression
        |
      Hu's algorithm on a single tree
        |
      Return

Overall flow-chart of the tree height reduction algorithm for
a block of assignment statements inside a DO statement.
(Boxes are listed in the order printed; any loop-back arrows
are not recoverable from the source.)

      Enter
        |
      Back substitution
        |
      Tree height reduction on an arithmetic expression
        |
      Hu's algorithm on a single tree
        |
      No. of iterations (e.g. 10)
        |
      Hu's algorithm on a set of trees
        |
      Return

Overall flow-chart of the tree height reduction algorithm in
an arithmetic expression.

      Enter
        |
      Decomposition of an arithmetic expression by
      parenthesized terms
        |
      Distribution algorithm
        |
      Composition of parenthesized terms back into an
      arithmetic expression
        |
      Printout of information of a tree for an arithmetic
      expression
        |
      Return

List of the functions of the procedures in the back
substitution and recursion algorithm.

SLHS:   Generation of the output variable set of a block of
        assignment statements.
SUBST:  Generation of the input variable set of a block of
        assignment statements. Generation of the block of
        assignment statements obtained after back substitution
        from an original block.
RECUR:  Issue a call to the tree height reduction algorithm.
        Apply Hu's algorithm on a single tree. Operation of
        the iterative case. Apply Hu's algorithm on a set of
        trees.
COMP:   Generation of precedence partitions. Use of Hu's
        algorithm. Printout of the final result.

List of the functions of the procedures in the tree height
reduction algorithm in an arithmetic expression.

STAND:   Standardization of a presented arithmetic expression
         into an arithmetic expression in standard form.
DASEM:   Decomposition of an arithmetic expression by
         parenthesized terms.
PAREN:   Generation of one term and one parenthesized term.
         Issue calls to procedures HOLES and SPACES.
FETCH:   Fetch of the information of the tree for a
         parenthesized term.
DENO:    Detection of the hole-filling distribution in the
         denominator.
NUME:    Detection of the hole-filling distribution in the
         numerator.
HOLES:   Computation of the information of a tree for one
         term.
SPACES:  Detection of the space-filling distribution.
         Computation of the information of a tree for a
         parenthesized term.
COMMOM:  Determination of the deletion of a parenthesized term
         over which a distribution has been done.
CALCU:   Calculation of the number of tree nodes for each term
         in an arithmetic expression.
MOVE:    Transfer of the information of each term in a
         parenthesized term to an arithmetic expression being
         processed.
DELET:   Deletion of the index number of a parenthesized term
         over which a distribution has been done from the index
         table.
PUSH:    Store of terms into a service array.
ASEM:    Composition of the parenthesized terms back to an
         arithmetic expression.
MEASURE: Print out the information of a tree for an
         arithmetic expression.